Group 12 - PCOS Analysis

Agnes Lorenzen, Cecille Hobbs, Freja E. Klippmann, Julie Dalgaard Petersen & Mille Rask Sander

Introduction

Background

  • Polycystic ovary syndrome (PCOS) is a syndrome documented in women of reproductive age

  • Commonly documented symptoms include period pain, irregular periods, ovary-related problems and hormone imbalance

  • Patients with PCOS often experience fertility problems and may have pregnancy complications

  • However, the cause of PCOS has not yet been established, and diagnosis is complicated

  • The data set was collected in India and comes from 10 different hospitals

Aim

The aim of this study is to examine a data set of patients with and without PCOS to identify potential biomarkers.

Data handling approach

  • Raw data:
    541 observations of 45 variables

  • 01_load_data:
    Simply loads the data

  • 02_clean_data:

    • Replacing invalid cell entries with NA
    • Renaming & factorizing columns
    • Splitting the data frame into body and blood measurements
    • Removing an empty column
  • 03_augment:

    • Unit changes (inch to cm)
    • Rounding & grouping BMI
    • Change blood type and cycles from numeric values to characters
    • Creating a new column for cycle/pregnancy stage
    • Merging the data frames into one file
# Round BMI to one decimal and assign each patient to a weight category
library(dplyr)

body_measurements <- body_measurements |>
  mutate(BMI = round(BMI, 1)) |>
  # Conditions are checked in order; each range is stated explicitly
  mutate(BMI_class = case_when(
    BMI < 18.5 ~ "Underweight",
    BMI >= 18.5 & BMI < 25 ~ "Normal weight",
    BMI >= 25 & BMI < 30 ~ "Overweight",
    BMI >= 30 ~ "Obesity")) |>
  # Store the categories as a factor ordered from lowest to highest BMI
  mutate(BMI_class = factor(BMI_class,
                            levels = c("Underweight",
                                       "Normal weight",
                                       "Overweight",
                                       "Obesity"))) |>
  relocate(BMI_class, .after = BMI)
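
The unit change listed under 03_augment (inch to cm) could be sketched in the same style. This is only an illustration on toy data: the column names waist_inch and hip_inch and the values are hypothetical and not taken from the data set. The conversion factor is 1 inch = 2.54 cm.

```r
library(dplyr)

# Toy data; column names and values are hypothetical
toy_measurements <- data.frame(
  waist_inch = c(30, 34, 28),
  hip_inch   = c(36, 40, 35)
)

# Convert inches to centimetres (1 inch = 2.54 cm) and drop the inch columns
toy_measurements <- toy_measurements |>
  mutate(waist_cm = waist_inch * 2.54,
         hip_cm   = hip_inch * 2.54) |>
  select(-waist_inch, -hip_inch)
```

Dropping the inch columns after conversion keeps a single unit per measurement, which avoids mixing units later in the analysis.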

Descriptive analysis of data

Dimensions:

# A tibble: 2 × 1
  `PCOS dimensions`
              <int>
1               541
2                44

Count of how many have PCOS:

# A tibble: 2 × 2
  PCOS_diagnosis     n
  <chr>          <int>
1 No               364
2 Yes              177
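
To make the class balance explicit, the counts above can be turned into proportions. A minimal sketch, with the two counts typed in by hand from the table rather than read from the data:

```r
library(dplyr)

# Counts copied from the table above
pcos_counts <- data.frame(
  PCOS_diagnosis = c("No", "Yes"),
  n = c(364L, 177L)
)

# Share of each class among all 541 observations
pcos_counts <- pcos_counts |>
  mutate(proportion = round(n / sum(n), 3))
```

Roughly one third of the observations carry a PCOS diagnosis.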

Descriptive analysis of data

PCA of blood measurements

No separation between PCOS-diagnosed and non-PCOS-diagnosed individuals
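
For readers unfamiliar with the method, a PCA of this kind could be set up roughly as follows. This is a sketch under assumptions, not the analysis behind the slide: the data are simulated, the variable names hormone_a to hormone_c are hypothetical, and base R's prcomp() is used here as one common choice.

```r
# Simulated stand-in data; variable names and values are hypothetical
set.seed(1)
toy_blood <- data.frame(
  PCOS_diagnosis = rep(c("No", "Yes"), each = 20),
  hormone_a = rnorm(40),
  hormone_b = rnorm(40),
  hormone_c = rnorm(40)
)

# PCA on the numeric columns only, centred and scaled to unit variance
pca_fit <- prcomp(toy_blood[, -1], center = TRUE, scale. = TRUE)

# Scores on the first two components, with the diagnosis kept for colouring a plot
pca_scores <- data.frame(
  PCOS_diagnosis = toy_blood$PCOS_diagnosis,
  pca_fit$x[, 1:2]
)

# Proportion of variance explained by each component
var_explained <- pca_fit$sdev^2 / sum(pca_fit$sdev^2)
```

Plotting PC1 against PC2 coloured by PCOS_diagnosis then shows whether the two groups form separate clusters. Scaling matters when variables are measured in different units, as blood measurements typically are.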

Body measurement data analysis

Follicle number and PCOS diagnosis:

  • Logistic regression showed a significant association between follicle number and PCOS diagnosis

PCA of body measurements

Slight separation between PCOS and non-PCOS individuals in body measurements

Logistic regression


Call:
glm(formula = PCOS_diagnosis ~ ., family = "binomial", data = data_model)

Coefficients:
              Estimate Std. Error z value Pr(>|z|)    
(Intercept)   -4.40301    0.74872  -5.881 4.09e-09 ***
follicle_no_R  0.20050    0.04944   4.055 5.01e-05 ***
follicle_no_L  0.34649    0.04944   7.009 2.40e-12 ***
avg_fsize_R   -0.02728    0.05084  -0.537    0.592    
avg_fsize_L    0.01160    0.05199   0.223    0.824    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 683.99  on 540  degrees of freedom
Residual deviance: 399.10  on 536  degrees of freedom
AIC: 409.1

Number of Fisher Scoring iterations: 5
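
The estimates above are on the log-odds scale, so exponentiating them gives odds ratios that are easier to read. The sketch below copies the numbers from the printed summary by hand instead of refitting the model, and it assumes that "Yes" is the outcome level being modelled.

```r
# Estimates copied from the summary output above (log-odds scale)
log_odds <- c(
  follicle_no_R = 0.20050,
  follicle_no_L = 0.34649,
  avg_fsize_R   = -0.02728,
  avg_fsize_L   = 0.01160
)

# Odds ratio per one-unit increase in each predictor
odds_ratios <- round(exp(log_odds), 2)

# Likelihood-ratio test of the fitted model against the intercept-only model,
# using the null and residual deviances and their degrees of freedom
deviance_drop <- 683.99 - 399.10
df_drop <- 540 - 536
p_value <- pchisq(deviance_drop, df = df_drop, lower.tail = FALSE)
```

Under that assumption, each additional follicle in the left ovary multiplies the odds of a PCOS diagnosis by about 1.41, and each additional follicle in the right ovary by about 1.22, holding the other predictors fixed. The odds ratios for average follicle size are close to 1, in line with their non-significant p-values.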

Logistic regression of follicle numbers as a biomarker

Discussion and Conclusion

  • Blood measurements did not separate PCOS from non-PCOS individuals in the PCA

    • A blood biomarker for PCOS diagnosis is not recommended based on these data

    • A limitation of the data is that they do not explicitly state where the women are in their cycle

  • Among the body measurements, left and right follicle numbers are significantly associated with PCOS diagnosis (p < 0.001)

    • A high follicle number could potentially be a biomarker for PCOS diagnosis

      • This aligns well with the use of ultrasound
  • The data set is imbalanced: more women without PCOS (364) than with PCOS (177)

  • The data set is not optimal for drawing firm conclusions